A study of tones and tempo in continuous Mandarin digit strings and their application in telephone quality speech recognition
Abstract
Prosodic cues (namely, fundamental frequency, energy, and duration) carry important information in speech. For a tonal language such as Chinese, fundamental frequency (F0) also plays a critical role in characterizing tone, which is an essential phonemic feature. In this paper, we describe our work on duration and tone modeling for telephone-quality continuous Mandarin digit strings, and the application of these models to improve recognition. The duration modeling includes a speaking-rate normalization scheme. A novel F0 extraction algorithm is developed, and parameters based on an orthonormal decomposition of the F0 contour are extracted for tone recognition. Context dependency is expressed by “tri-tone” models clustered into broad classes. A 20.0% error rate is achieved for four-tone classification. Over a baseline word error rate of 5.1%, we achieve a 31.4% relative error reduction with duration models, 23.5% with tone models, and 39.2% with duration and tone models combined.
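The abstract does not say which orthonormal basis is used for the F0 contour. The sketch below assumes a discrete Legendre-like polynomial basis, built by Gram-Schmidt orthonormalization of 1, x, x^2, ... sampled over one syllable's F0 frames; the function names and the default order of 3 are hypothetical choices for illustration, not the paper's parameters. Under this assumed basis, c0 reflects the average pitch level and c1 the overall slope, which is the kind of information that separates a rising tone 2 from a falling tone 4. Real systems would likely work on log F0 with speaker normalization; that preprocessing is omitted here.

```python
import math
from typing import List


def orthonormal_basis(n_points: int, order: int) -> List[List[float]]:
    """Discrete orthonormal polynomial basis (Legendre-like) on n_points samples.

    Assumption: the monomials 1, x, ..., x^order, sampled on an evenly spaced
    grid over [-1, 1], are orthonormalized with modified Gram-Schmidt under the
    discrete inner product <u, v> = sum_i u[i] * v[i].
    """
    if n_points < max(2, order + 1):
        raise ValueError("need at least max(2, order + 1) F0 samples")
    xs = [-1.0 + 2.0 * i / (n_points - 1) for i in range(n_points)]
    basis: List[List[float]] = []
    for k in range(order + 1):
        v = [x ** k for x in xs]
        # Remove the components along the basis vectors built so far.
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis


def f0_contour_features(f0: List[float], order: int = 3) -> List[float]:
    """Hypothetical tone features: coefficients c_k = <f0, phi_k>."""
    basis = orthonormal_basis(len(f0), order)
    return [sum(fi * bi for fi, bi in zip(f0, b)) for b in basis]


def reconstruct(coeffs: List[float], n_points: int) -> List[float]:
    """Least-squares approximation of the contour from its coefficients."""
    basis = orthonormal_basis(n_points, len(coeffs) - 1)
    return [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(n_points)]


if __name__ == "__main__":
    rising = [100.0 + 5.0 * i for i in range(20)]  # toy tone-2-like contour (Hz)
    falling = rising[::-1]                         # toy tone-4-like contour (Hz)
    print([round(c, 2) for c in f0_contour_features(rising)])
    print([round(c, 2) for c in f0_contour_features(falling)])
```

For the toy rising contour c1 is positive and for its mirror image it is negative, while c0 (the mean F0 times the square root of the number of frames) is the same for both. Because the basis is orthonormal, each coefficient can be computed independently with a single inner product, and truncating at a low order keeps level, slope, and curvature while discarding frame-level jitter.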
Similar resources
Modeling Lexical Tones for Mandarin Large Vocabulary Continuous Speech Recognition
Large vocabulary Mandarin speech recognition with different approaches in modeling tones
Large vocabulary continuous Mandarin speech recognition has been an important problem for speech recognition researchers for several reasons [1], [3]. First of all, it is a tonal language that requires special treatment for the modeling of tones. There are five tones in Mandarin which are necessary to disambiguate between confusable words. Secondly, the difficulty of entering Chinese by keyboar...
Connected Digit Recognition Experiments with the OGI Toolkit's Neural Network and HMM-Based Recognizers
This paper describes a series of experiments that compare different approaches to training a speaker-independent continuous-speech digit recognizer using the CSLU Toolkit. Comparisons are made between the Hidden Markov Model (HMM) and Neural Network (NN) approaches. In addition, a description of the CSLU Toolkit research environment is given. The CSLU Toolkit is a research and development softwa...
High-Order Hidden Markov Model and Application to Continuous Mandarin Digit Recognition
The duration and spectral dynamics of the speech signal are modeled by the duration high-order hidden Markov model (DHO-HMM). Both the state transition probabilities and the output observation probabilities depend not only on the current state but also on several previous states. Recursive formulas have been derived for the calculation of the log-likelihood score of optimal partial paths. The high-order stat...
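The snippet above states only that transition and output probabilities depend on several previous states and that recursive formulas give the log-likelihood of optimal partial paths; the recursions themselves are not shown. As a rough illustration, here is a generic second-order Viterbi search (the simplest "high-order" case, conditioning on the two most recent states), in which the search variable is the state pair (s_{t-1}, s_t). This is not the DHO-HMM formulation from that paper: explicit duration modeling is left out, and the function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def viterbi_second_order(
    obs: Sequence[int],
    log_init: List[float],                # log P(s_0 = j)
    log_emit0: List[List[float]],         # log P(o_0 | s_0 = j)
    log_trans1: List[List[float]],        # log P(s_1 = j | s_0 = i)
    log_trans2: List[List[List[float]]],  # log P(s_t = k | s_{t-2} = i, s_{t-1} = j)
    log_emit: List[List[List[float]]],    # log P(o_t | s_{t-1} = j, s_t = k)
) -> Tuple[float, List[int]]:
    """Best state sequence and its log score under a second-order HMM."""
    n, T = len(log_init), len(obs)
    if T == 0:
        return NEG_INF, []
    d0 = [log_init[j] + log_emit0[j][obs[0]] for j in range(n)]
    if T == 1:
        best = max(range(n), key=lambda j: d0[j])
        return d0[best], [best]
    # delta[i][j]: best partial-path log score ending in (s_{t-1}=i, s_t=j).
    delta = [[d0[i] + log_trans1[i][j] + log_emit[i][j][obs[1]]
              for j in range(n)] for i in range(n)]
    backptrs = []  # one table per t >= 2: best s_{t-2} for each (s_{t-1}, s_t)
    for t in range(2, T):
        new = [[NEG_INF] * n for _ in range(n)]
        bp = [[0] * n for _ in range(n)]
        for j in range(n):
            for k in range(n):
                best_i, best_score = 0, NEG_INF
                for i in range(n):
                    s = delta[i][j] + log_trans2[i][j][k]
                    if s > best_score:
                        best_i, best_score = i, s
                new[j][k] = best_score + log_emit[j][k][obs[t]]
                bp[j][k] = best_i
        delta = new
        backptrs.append(bp)
    # Best final pair, then walk the back-pointers to recover the full path.
    best_score, j, k = max((delta[j][k], j, k) for j in range(n) for k in range(n))
    path = [j, k]
    for bp in reversed(backptrs):
        path.insert(0, bp[path[0]][path[1]])
    return best_score, path


if __name__ == "__main__":
    # Toy 2-state, 2-symbol model: state j prefers symbol j; transitions uniform.
    lg, n = math.log, 2
    log_init = [lg(0.5)] * n
    log_emit0 = [[lg(0.9 if o == j else 0.1) for o in range(2)] for j in range(n)]
    log_trans1 = [[lg(0.5)] * n for _ in range(n)]
    log_trans2 = [[[lg(0.5)] * n for _ in range(n)] for _ in range(n)]
    log_emit = [[[lg(0.9 if o == j else 0.1) for o in range(2)]
                 for j in range(n)] for _ in range(n)]
    print(viterbi_second_order([0, 0, 1, 1, 0], log_init, log_emit0,
                               log_trans1, log_trans2, log_emit))
```

Searching over state pairs turns the second-order model into a first-order one over n^2 pair-states, so the cost grows as O(T·n^3) rather than the O(T·n^2) of ordinary Viterbi; that growth is the usual price of raising the model order.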
Publication date: 1998